Abstract. We continue to study the notion of cancellation-free linear
circuits. We show that every matrix can be computed by a
cancellation-free circuit, and almost all of these are at most a constant
factor larger than the optimum linear circuit that computes the matrix. It
appears to be easier to prove statements about the structure of
cancellation-free linear circuits than for linear circuits in general. We
prove two nontrivial superlinear lower bounds. We show that a
cancellation-free linear circuit computing the $n \times n$ Sierpinski
gasket matrix must use at least $\frac{1}{2} n \log n$ gates, and that
this is tight. This supports a conjecture by Aaronson. Furthermore we show
that a proof strategy for proving lower bounds on monotone circuits can be
almost directly converted to prove lower bounds on cancellation-free
linear circuits. We use this together with a result from extremal graph
theory due to Andreev to prove a lower bound of
$\Omega(n^{2-\varepsilon})$ for infinitely many $n \times n$ matrices for
every $\varepsilon > 0$. These lower bounds for concrete matrices are
almost optimal since all matrices can be computed with $O(n^2/\log n)$
gates.

1 Introduction and Known Results 

Let $\mathbb{F}_2$ be the Galois field of order 2, and let $\mathbb{F}_2^n$
be the $n$-dimensional vector space over $\mathbb{F}_2$. A Boolean function
$f : \mathbb{F}_2^n \to \mathbb{F}_2^m$ is said to be linear if there
exists a Boolean $m \times n$ matrix $A$ such that $f(x) = Ax$ for every
$x \in \mathbb{F}_2^n$. This is equivalent to saying that $f$ can be
computed using only XOR gates.

An XOR-AND circuit $C$ is a directed acyclic graph. There are $n + 1$ nodes
with in-degree 0, called the inputs; one of these is the constant value 1.
All other nodes have in-degree 2 and are called gates. Every gate is labeled
either $\oplus$ (XOR) or $\wedge$ (AND). There are $m$ gates which are called
the outputs; these are labeled $y_1, \ldots, y_m$. The value of a gate
labeled $\wedge$ is the product of its inputs (children), and the value of a
gate labeled $\oplus$ is the sum of its two children (addition in
$\mathbb{F}_2$, denoted $\oplus$). The circuit $C$, with inputs
$x = (x_1, \ldots, x_n)$, computes the $m \times n$ matrix $A$ if the output
vector computed by $C$, $y = (y_1, \ldots, y_m)$, satisfies $y = Ax$.
In other words, output $y_i$ is defined by the $i$th row of the matrix. The
size of a circuit $C$, denoted $|C|$, is the number of gates in $C$. For
simplicity, we will let $m = n$ unless explicitly stated otherwise. A
circuit is linear if every gate is labeled $\oplus$.
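
As an illustration of these definitions, the following Python sketch evaluates a linear circuit and checks exhaustively that it computes a given matrix. The encoding (wires 0 to n-1 for the inputs, wire n+g for gate g) and the names `evaluate` and `computes` are assumptions made for this example, not notation from the paper. The constant input 1 is left out because a linear circuit does not need it, and for brevity an output may point directly at an input wire.

```python
from itertools import product

def evaluate(n, gates, outputs, x):
    """Evaluate a linear (XOR-only) circuit on the input bits x.

    Wires 0..n-1 are the inputs; gate g (0-based) is wire n+g and
    XORs the two earlier wires listed in gates[g].
    """
    wires = list(x)
    for a, b in gates:
        wires.append(wires[a] ^ wires[b])
    return [wires[o] for o in outputs]

def computes(n, gates, outputs, A):
    """Check y = Ax over F_2 for every input x (exhaustive, small n only)."""
    for x in product((0, 1), repeat=n):
        expected = [sum(a * b for a, b in zip(row, x)) % 2 for row in A]
        if evaluate(n, gates, outputs, x) != expected:
            return False
    return True

# A circuit of size 2 for a 3 x 3 matrix; the first output is the input x_1.
A = [[1, 0, 0],
     [1, 1, 0],
     [1, 1, 1]]
gates = [(0, 1), (3, 2)]      # wire 3 = x_1 + x_2, wire 4 = wire 3 + x_3
outputs = [0, 3, 4]
print(computes(3, gates, outputs, A))
```

The exhaustive check takes time exponential in n, so it is only meant for toy instances.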

For relatively dense matrices, computing all the rows independently gives
$O(n)$ gates for each output, that is, a circuit of size $O(n^2)$. It
follows from a theorem by Lupanov [12, 14] that this upper bound can be
improved.

Theorem 1 (Lupanov). Every $n \times n$ matrix can be computed using a
circuit of size $O(n^2/\log n)$.

A counting argument shows that this is asymptotically tight. In fact, the
vast majority of matrices require this number of gates up to a constant
factor. Despite this fact, there is no known concrete family of matrices
requiring superlinear size.



Another, but related, circuit model is the one where we allow unbounded
fan-in and arbitrary gates (that is, gates computing any predicate are
allowed), but require bounded depth. The circuit complexity of such a
circuit is the number of wires. Here the lower bound situation is a little
better; Alon, Karchmer and Wigderson [2] showed in 1990 that a particular
family of matrices requires $\Omega(n \log n)$ wires for linear circuits in
this model. This has recently been improved by Gál et al. [11] who have
proven that a concrete infinite family of matrices



[10] gave a survey of the strategies used for proving lower bounds on wire
complexity for general (not necessarily linear) Boolean operators in
bounded depth, and the limitations of these.

Returning to the circuit model with bounded fan-in, the situation is even
worse for general Boolean predicates. Here we know by a seminal result by
Shannon [20, 22] that almost every function requires $\Omega(2^n/n)$ gates,
but again no superlinear bound is known for a concrete family of functions.
A popular, and essentially the only known, technique for proving
non-trivial linear lower bounds is the technique of gate elimination. The
key idea when using gate elimination is to set some of the inputs to
constant values, arguing that a certain number of gates get "eliminated"
and that this results in a function inductively assumed to have a certain
size. Gate elimination was first used by Schnorr [19] to prove a $2n$ lower
bound, and later improved by Paul [16] and again by Blum [3] who in 1984
presented a $3n$ lower bound for a family of functions when using the full
binary basis. This is still the best concrete lower bound known [12]. For a
description of the gate-elimination method see the survey of Boppana and
Sipser [6] or the essay by Blum [5]. In both of these it is mentioned that
it is unlikely that the gate elimination method will ever yield superlinear
lower bounds.

In the case of general Boolean functions there are a number of functions
conjectured to have superlinear size; examples include any NP-complete
language. For linear operators there are, as far as the authors know, only
a few families of matrices conjectured to have superlinear size. One of
these is the Sierpinski gasket matrix (Aaronson, personal communication and
[1]), described later in this paper.

One proof strategy for proving lower bounds is to prove lower bounds for a 
restricted circuit model, and to prove that sizes of circuits computing a function 
in the restricted circuit model are not too much larger than in the original 
model. This was essentially the motivation for looking at monotone circuits. 
In [18], Razborov gave a superpolynomial lower bound for the Clique function
for monotone circuits. The hope at that time was that the monotone circuit
complexity was polynomially related to general Boolean circuit complexity.
This was disproven by Razborov in [17], showing that the gap was
superpolynomial.
For more details, see e.g. [6]. 

2 Cancellation-free Linear Circuits 

For linear circuits, the value computed by every gate is the parity function
of some subset of the $n$ variables. That is, the output of every gate $u$
can be considered as a vector $\kappa(u)$ in the vector space
$\mathbb{F}_2^n$, where $\kappa(u)_i = 1$ if and only if $x_i$ is a term in
the parity function computed by the gate $u$. We call $\kappa(u)$ the value
vector of $u$, and for input variables define $\kappa(x_i) = e_i$, that is,
the unit vector having the $i$th coordinate 1 and all others 0. It is clear
by definition that if a gate $u$ has the two children $w, t$, then
$\kappa(u) = \kappa(w) \oplus \kappa(t)$, where $\oplus$ denotes
coordinatewise addition in $\mathbb{F}_2$. We say that a linear circuit is
cancellation-free if for every pair of gates $u, w$ where $u$ is an ancestor
of $w$, $\kappa(u) \geq \kappa(w)$, where $\geq$ denotes the usual
coordinatewise partial order. That is, if $x_i$ is a term in a gate $w$, it
is a term in all subsequent gates. The intuition behind this is that if
this condition is satisfied, the circuit never exploits the fact that in
$\mathbb{F}_2$, $a \oplus a = 0$. That is, things do not "cancel out" in the
circuit. By definition, it is clear that any linear operator can be
computed by a cancellation-free circuit. The following proposition comes
directly from the definition of cancellation-free.

Proposition 1. The following are equivalent:

– $C$ is cancellation-free

– For every pair of vertices $v_1, v_2$ in $C$, there do not exist two
disjoint paths in $C$ from $v_1$ to $v_2$

– For every $v$ where $\kappa(v)_i = 0$ there is no path from $x_i$ to $v$

– $C$ does not contain the triangle $K_3$ as an undirected minor
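
The definition of cancellation-free in terms of value vectors can be checked mechanically. The sketch below is a minimal Python illustration under assumptions of our own: value vectors are stored as bitmasks, wires 0 to n-1 are the inputs, and the function names are hypothetical. Because the coordinatewise order is transitive, it suffices to compare every gate with its two children.

```python
def value_vectors(n, gates):
    """Return kappa for every wire as an n-bit mask (bit i set iff x_{i+1} occurs)."""
    kappa = [1 << i for i in range(n)]          # unit vectors for the inputs
    for a, b in gates:
        kappa.append(kappa[a] ^ kappa[b])       # coordinatewise addition in F_2
    return kappa

def is_cancellation_free(n, gates):
    """Check kappa(u) >= kappa(w) for every gate u and each child w.

    Since >= is transitive, checking parent/child pairs covers all
    ancestor pairs.
    """
    kappa = value_vectors(n, gates)
    for g, (a, b) in enumerate(gates):
        u = kappa[n + g]
        for child in (kappa[a], kappa[b]):
            if child & ~u:                      # a term of the child is missing in u
                return False
    return True

# x_1 + x_2, then (x_1 + x_2) + x_3: no term is ever cancelled.
print(is_cancellation_free(3, [(0, 1), (3, 2)]))
# (x_1 + x_2) + (x_2 + x_3) = x_1 + x_3: the term x_2 cancels.
print(is_cancellation_free(3, [(0, 1), (1, 2), (3, 4)]))
```

Equivalently, a gate passes the test exactly when its two children have disjoint supports.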

The notion of cancellation-free circuits was introduced by Boyar and Peralta
in [7, 8]. Their paper concerns straight-line programs for computing linear
forms, which is equivalent to the model studied in this paper. They proved
that the problem of finding shortest linear circuits for linear operators
is NP-hard, even when restricted to cancellation-free circuits. They also
noticed that most heuristics for constructing small linear circuits never
exploit the cancellation property. Then, they constructed a gate-minimizing
heuristic that uses cancellation.

3 Relationship Between Cancellation-free Linear Circuits
and General Linear Circuits

Boyar and Peralta proved in [7] that there exists an infinite family of
matrices where the sizes of cancellation-free circuits computing them are
at least $\frac{3}{2} - o(1)$ times larger than the optimum. We call this
ratio the cancellation ratio, $\rho(n)$. We can strengthen the lower bound
to 2 using a surprisingly simple matrix. This construction is originally
due to Svensson [21].

Theorem 2. There exists an infinite family of matrices such that any
cancellation-free circuit computing them must have size $2 - o(1)$ times
larger than the optimum. Thus $\rho(n) \geq 2 - o(1)$.

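Proof. Consider the $n \times n$ matrix:

$$
\begin{pmatrix}
0 & 1 & 1 & \cdots & 1 & 1 \\
1 & 1 & 0 & \cdots & 0 & 0 \\
1 & 1 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
1 & 1 & 1 & \cdots & 1 & 0 \\
1 & 1 & 1 & \cdots & 1 & 1
\end{pmatrix}
$$

That is, $y_1 = x_2 \oplus \cdots \oplus x_n$ and
$y_j = x_1 \oplus \cdots \oplus x_j$ for $2 \leq j \leq n$.

If one allows cancellation this matrix can be computed by a circuit of size
$n$, by first computing $x_1 \oplus x_2$ to obtain $y_2$. For
$3 \leq j \leq n$, adding $x_j$ to $y_{j-1}$ gives $y_j$. Thus, we use
$n - 1$ gates to compute $y_2, \ldots, y_n$. After that we can obtain $y_1$
with one gate since $y_1 = y_n \oplus x_1$.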

Consider any cancellation-free linear circuit $C$ computing the matrix. Let
the set $S$ contain the gate computing $y_1$ and all its (noninput)
predecessors. Clearly $|S| \geq n - 2$ since $y_1$ is the sum of $n - 1$
terms.

Notice that because $C$ is cancellation-free, none of the gates in $S$ can
compute any of the output values $y_2, \ldots, y_n$ (each of these contains
the term $x_1$, which $y_1$ does not). Therefore for every $j > 1$ we need
at least one gate to compute $y_j$. Thus one needs $n - 1$ extra gates for
this part. This adds up to $2n - 3$, and the ratio is therefore
$\frac{2n - 3}{n} = 2 - o(1)$, proving the theorem. □
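
The size-$n$ circuit with cancellation described in the proof can be sketched in Python as follows. The sketch assumes, as the proof indicates, that $y_1 = x_2 \oplus \cdots \oplus x_n$ and $y_j = x_1 \oplus \cdots \oplus x_j$ for $j \geq 2$; the wire encoding (inputs on wires 0 to n-1, gate g on wire n+g) and the function names are hypothetical.

```python
def theorem2_circuit(n):
    """Return (gates, outputs) for the size-n circuit that uses cancellation."""
    gates = [(0, 1)]                         # y_2 = x_1 + x_2, stored on wire n
    for j in range(3, n + 1):
        prev = n + len(gates) - 1            # wire holding y_{j-1}
        gates.append((prev, j - 1))          # y_j = y_{j-1} + x_j
    y_n = n + len(gates) - 1
    gates.append((y_n, 0))                   # y_1 = y_n + x_1 (x_1 cancels)
    y_1 = n + len(gates) - 1
    outputs = [y_1] + list(range(n, y_1))    # wires of y_1, y_2, ..., y_n
    return gates, outputs

def value_vectors(n, gates):
    """Value vector of every wire as a bitmask (bit i set iff x_{i+1} occurs)."""
    kappa = [1 << i for i in range(n)]
    for a, b in gates:
        kappa.append(kappa[a] ^ kappa[b])
    return kappa

n = 6
gates, outputs = theorem2_circuit(n)
kappa = value_vectors(n, gates)
rows = [kappa[o] for o in outputs]
expected = [((1 << n) - 1) & ~1] + [(1 << j) - 1 for j in range(2, n + 1)]
print(len(gates), rows == expected)
```

The last gate is the only one where a term disappears, which is exactly the step a cancellation-free circuit cannot perform.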

It turns out that for almost every matrix, the cancellation ratio is constant. 

Lemma 1. If cancellation is allowed, almost every $n \times n$ matrix needs
$\frac{n^2}{4 \log n} - o\!\left(\frac{n^2}{\log n}\right)$ gates to be
computed.

Proof. The number of $n \times n$ matrices is $2^{n^2}$. Since there are two
inputs to each of the $M$ gates, and each of the $n$ outputs is either the
output from a gate or an input (or zero), the number of circuits with $n$
inputs, $n$ outputs and $M$ gates is at most

$$\frac{(n + M)^{2M} (n + M + 1)^n}{M!}.$$

Taking the logarithm one gets

$$2M \log(n + M) + n \log(n + M + 1) - \log(M!).$$

Recalling that $\log(M!) = M \log M - O(M)$, for sufficiently large $n$ and
$n < M$ this is at most

$$2M \log(2M) + M \log(2M) - M \log M + O(M) = 2M \log M + O(M),$$

so the number of distinct circuits is at most $2^{2M \log M + O(M)}$. For
$0 < \varepsilon < 1$ the number of matrices that can be computed with
$M = (1 - \varepsilon) \frac{n^2}{4 \log n}$ gates is at most

$$2^{2M \log M + O(M)} \leq 2^{(1 - \varepsilon) n^2 + o(n^2)}.$$

That is, the fraction of matrices not computable is at least

$$1 - \frac{2^{(1 - \varepsilon) n^2 + o(n^2)}}{2^{n^2}}.$$

Since this tends to 1, almost every matrix has circuit size at least

$$\frac{n^2}{4 \log n} - o\!\left(\frac{n^2}{\log n}\right).$$

□
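
The substitution behind the exponent $(1 - \varepsilon) n^2 + o(n^2)$ can be spelled out as a short worked step. It assumes $M = (1 - \varepsilon) \frac{n^2}{4 \log n}$, the number of gates considered in the proof, and logarithms to base 2:

```latex
\begin{align*}
\log M &= 2\log n - \log\log n + \log\frac{1-\varepsilon}{4} \;\le\; 2\log n, \\
2M\log M + O(M)
  &\le 2 \cdot (1-\varepsilon)\frac{n^2}{4\log n} \cdot 2\log n
       + O\!\left(\frac{n^2}{\log n}\right)
   = (1-\varepsilon)\,n^2 + o(n^2).
\end{align*}
```

Dividing by the $2^{n^2}$ matrices then leaves a fraction of at most $2^{-\varepsilon n^2 + o(n^2)}$ that can be computed with $M$ gates.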



We will now show that the construction in the proof of Theorem 1 produces
a circuit that is cancellation-free. Before stating the lemma and its proof
we will need a definition of rectangular decompositions: Given a Boolean
$n \times n$ matrix $A$, the Boolean matrices $B_1, \ldots, B_k$ constitute
a rectangular decomposition if $A = B_1 + B_2 + \cdots + B_k$, where
addition is over the reals and every $B_i$ has rank 1. We say that the
weight of $B_i$ is the number of nonzero columns plus the number of nonzero
rows. The weight of a rectangular decomposition is the sum of the weights
of the $B_i$'s. Lupanov showed in [14] (see also [12]) that every
$n \times n$ matrix admits a rectangular decomposition of weight
$(1 + o(1)) \frac{n^2}{\log n}$.

Lemma 2. Every $n \times n$ matrix can be computed by a cancellation-free
linear circuit of size $(1 + o(1)) \frac{n^2}{\log n}$.

Proof. Let the Boolean $n \times n$ matrix $A$ be arbitrary. Consider the
rectangular decomposition $B_1, \ldots, B_k$ assumed to exist by Lupanov's
theorem. For each $i$ let $c_i$ ($r_i$) denote the number of nonzero columns
(rows) in $B_i$. Add for each $B_i$ the inputs corresponding to the nonzero
columns, using $c_i - 1$ gates. Call the result $s_i$. Now each output is a
sum of $s_i$'s. For each $y_j$, add these $s_i$'s. In total, this takes at
most $\sum_i r_i$ gates. The total number of gates is at most

$$\sum_i (c_i - 1) + \sum_i r_i \leq (1 + o(1)) \frac{n^2}{\log n}.$$

Since the addition $B_1 + \cdots + B_k$ in the rectangular decomposition is
over the reals, the circuit is cancellation-free. □
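
The construction in this proof can be sketched in Python as follows. This is a minimal illustration, not Lupanov's decomposition algorithm: the rectangular decomposition is taken as given, in a hypothetical format where each $B_i$ is a pair (set of nonzero rows, set of nonzero columns), and the rectangles are assumed not to overlap so that their sum over the reals is the matrix.

```python
def circuit_from_decomposition(n, rectangles):
    """Build a linear circuit from rectangles (rows, cols) whose sum over the reals is A.

    Returns (gates, outputs); outputs[j] is a wire index, or None for a zero row.
    """
    gates = []

    def xor_chain(wires):
        # Sum the given wires with len(wires) - 1 gates.
        acc = wires[0]
        for w in wires[1:]:
            gates.append((acc, w))
            acc = n + len(gates) - 1
        return acc

    # s_i: the sum of the inputs in the nonzero columns of B_i.
    s = [xor_chain(sorted(cols)) for rows, cols in rectangles]
    outputs = []
    for j in range(n):
        parts = [s[i] for i, (rows, cols) in enumerate(rectangles) if j in rows]
        outputs.append(xor_chain(parts) if parts else None)
    return gates, outputs

def value_vectors(n, gates):
    """Value vector of every wire as a bitmask (bit i set iff x_{i+1} occurs)."""
    kappa = [1 << i for i in range(n)]
    for a, b in gates:
        kappa.append(kappa[a] ^ kappa[b])
    return kappa

# A = [[1,1,0],[1,1,1],[0,0,1]] as the sum of two rectangles.
n = 3
rectangles = [({0, 1}, {0, 1}), ({1, 2}, {2})]
gates, outputs = circuit_from_decomposition(n, rectangles)
kappa = value_vectors(n, gates)
rows = [kappa[o] for o in outputs]
cancellation_free = all((kappa[a] & kappa[b]) == 0 for a, b in gates)
print(len(gates), rows, cancellation_free)
```

In this toy instance the circuit uses 2 gates, within the bound $\sum_i (c_i - 1) + \sum_i r_i = 1 + 4$ from the proof, and no gate ever adds two values sharing a variable because the rectangles are disjoint.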

Combining the two lemmas we get the following: 

Theorem 3. For almost every matrix, the cancellation ratio, $\rho(n)$, is constant.

4 Lower Bound on the Size of Cancellation-free Circuits
Computing the Sierpinski Gasket Matrix

In this section we will prove that the $n \times n$ Sierpinski gasket matrix
needs $\frac{1}{2} n \log n$ gates when computed by a linear
cancellation-free circuit, and that this suffices.

Suppose some subset of the input variables is restricted to the value 0.
Now look at the resulting circuit. Some of the gates will now compute the
value $z = 0 \oplus w$. In this case, we say that the gate is eliminated
since it no longer does any computation. The situation can be even more
extreme: some gate might "compute" $z = 0$. In both cases, we can remove the
gate from the circuit, and forward the input if necessary (if $z$ is an
output gate, $w$ now outputs the result). In the second case, the parent of
$z$ will get eliminated, so the effect might cascade. For any subset of the
variables, there is a unique set of gates that become eliminated when
setting these variables to 0.
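
A minimal Python sketch of this elimination process, assuming a cancellation-free circuit so that a gate computes the constant 0 only when both of its children do. The encoding (inputs on wires 0 to n-1, gate g on wire n+g, value vectors as bitmasks) and the function name are hypothetical.

```python
def eliminated_gates(n, gates, zero_inputs):
    """Return the indices of the gates eliminated when zero_inputs are set to 0.

    A gate counts as eliminated if at least one child is the constant 0
    after the restriction, which covers both z = 0 + w and z = 0.
    Assumes the circuit is cancellation-free.
    """
    keep = ~sum(1 << i for i in zero_inputs)
    kappa = [(1 << i) & keep for i in range(n)]   # restricted value vectors
    eliminated = set()
    for g, (a, b) in enumerate(gates):
        if kappa[a] == 0 or kappa[b] == 0:
            eliminated.add(g)
        kappa.append(kappa[a] ^ kappa[b])
    return eliminated

# Gates: g0 = x_1 + x_2, g1 = x_3 + x_4, g2 = g0 + g1.
gates = [(0, 1), (2, 3), (4, 5)]
print(sorted(eliminated_gates(4, gates, {0, 1})))
```

With $x_1 = x_2 = 0$ the gate g0 computes the constant 0, and the effect cascades to g2, which merely forwards g1.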

The Sierpinski gasket matrix is defined recursively as:

$$S_0 = (1), \qquad S_{k+1} = \begin{pmatrix} S_k & 0 \\ S_k & S_k \end{pmatrix}.$$

In all of the following let $n = 2^k$, and let $S_k$ be the $n \times n$
Sierpinski gasket matrix. First we need a fact about $S_k$:

Proposition 2. For every $k$ the determinant of the Sierpinski gasket matrix
is 1. In particular the $2^k$ rows in $S_k$ are linearly independent.

Proof. The determinant of a block triangular matrix is the product of the
determinants of its diagonal blocks, so by induction on $k$:

$$\det(S_{k+1}) = \det \begin{pmatrix} S_k & 0 \\ S_k & S_k \end{pmatrix} = \det(S_k) \det(S_k) = 1.$$

□
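
For concreteness, the recursive definition can be unrolled in a few lines of Python. The sketch assumes the block form $S_{k+1} = \begin{pmatrix} S_k & 0 \\ S_k & S_k \end{pmatrix}$, which is the form the lower bound proof relies on; the function name is our own.

```python
def sierpinski(k):
    """Return S_k as a list of rows, using S_0 = (1) and the block recursion."""
    S = [[1]]
    for _ in range(k):
        n = len(S)
        top = [row + [0] * n for row in S]       # (S_k  0 )
        bottom = [row + row for row in S]        # (S_k S_k)
        S = top + bottom
    return S

S3 = sierpinski(3)
for row in S3:
    print("".join(map(str, row)))

# S_k is lower triangular with an all-ones diagonal, so its determinant is 1.
n = len(S3)
print(all(S3[i][i] == 1 for i in range(n)),
      all(S3[i][j] == 0 for i in range(n) for j in range(i + 1, n)))
```

With rows and columns indexed from 0, entry $(i, j)$ is 1 exactly when the binary representation of $j$ is a submask of that of $i$, which gives a direct way to test the construction.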

Theorem 4. For every $k \geq 2$, any cancellation-free circuit that computes
the $n \times n$ Sierpinski gasket matrix has size at least
$\frac{1}{2} n \log_2 n$.

Proof. The proof is by induction on $k$. For the base case, look at the
$2 \times 2$ matrix $S_1$. This clearly needs at least
$\frac{1}{2} \cdot 2 \log_2 2 = 1$ gate.

Suppose the statement is true for some $k$, now look at the $2n \times 2n$
matrix $S_{k+1}$. Denote the output gates $y_1, \ldots, y_{2n}$ and the
inputs $x_1, \ldots, x_{2n}$. Partition the gates of $C$ into three disjoint
sets, $C_1$, $C_2$ and $C_3$, defined as follows:

– $C_1$: The gates having only inputs from $x_1, \ldots, x_n$ and $C_1$.
Equivalently the gates not reachable from inputs $x_{n+1}, \ldots, x_{2n}$.

– $C_2$: The gates in $C - C_1$ that are not eliminated when inputs
$x_1, \ldots, x_n$ are set to 0.

– $C_3$: $C - (C_1 \cup C_2)$. That is, the gates in $C - C_1$ that do become
eliminated when inputs $x_1, \ldots, x_n$ are set to 0.

Obviously $|C| = |C_1| + |C_2| + |C_3|$. We will now give lower bounds on the
sizes of $C_1$, $C_2$, and $C_3$.

$C_1$: Since the circuit is cancellation-free, the outputs
$y_1, \ldots, y_n$ and all their predecessors are in $C_1$. By the induction
hypothesis, $|C_1| \geq \frac{1}{2} n \log_2 n$.

$C_2$: Since the gates in $C_2$ are not eliminated when
$x_1 = \cdots = x_n = 0$, they compute $S_k$ on the inputs
$x_{n+1}, \ldots, x_{2n}$. By the induction hypothesis
$|C_2| \geq \frac{1}{2} n \log_2 n$.

$C_3$: The goal is to prove that this set has size at least $n$. Let
$\delta(C_1)$ be the set of arcs from $C_1 \cup \{x_1, \ldots, x_n\}$ to
$B = C_2 \cup C_3$. We first prove that

$$|C_3| \geq |\delta(C_1)|. \qquad (1)$$

By definition, all gates in $C_1$ attain the value 0 when
$x_1, \ldots, x_n$ are set to 0. Let $(v, w) \in \delta(C_1)$ be arbitrary.
Since $v \in C_1 \cup \{x_1, \ldots, x_n\}$, $w$ becomes eliminated, so
$w \in C_3$. Every $u \in C_3$ has at most one child in
$C_1 \cup \{x_1, \ldots, x_n\}$, since a gate with both children there would
itself belong to $C_1$. So $|C_3| \geq |\delta(C_1)|$.

We now show that $|\delta(C_1)| \geq n$. Let the endpoints of $\delta(C_1)$
in $C_1$ be $e_1, \ldots, e_p$ and let their corresponding value vectors be
$v_1, \ldots, v_p$.

Now look at the value vectors of the output gates
$y_{n+1}, \ldots, y_{2n}$. For each of these, the vector consisting of the
first $n$ coordinates must be in $\mathrm{span}(v_1, \ldots, v_p)$, but by
Proposition 2 the rows of $S_k$ span a space of dimension $n$, so
$p \geq n$.

We have that $|C_3| \geq |\delta(C_1)| \geq n$, so

$$|C| = |C_1| + |C_2| + |C_3| \geq \frac{1}{2} n \log_2 n + \frac{1}{2} n \log_2 n + n = \frac{1}{2} (2n) \log_2 (2n).$$

□

It turns out that this is tight. 

Proposition 3. The Sierpinski matrix can be computed by a cancellation-free
circuit using $\frac{1}{2} n \log_2 n$ gates.

Proof. This is clearly true for $S_2$. Assume that $S_k$ can be computed
using $\frac{1}{2} n \log_2 n$ gates. Consider the matrix $S_{k+1}$.
Construct the circuit in a divide and conquer manner; construct recursively
on variables $x_1, \ldots, x_n$ and $x_{n+1}, \ldots, x_{2n}$. This gives
outputs $y_1, \ldots, y_n$. After this use $n$ operations to finish the
outputs $y_{n+1}, \ldots, y_{2n}$. This adds up to exactly
$\frac{1}{2} (2n) \log_2 (2n)$. □
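
The divide-and-conquer construction can be sketched in Python as follows. The sketch assumes the block recursion $S_{k+1} = \begin{pmatrix} S_k & 0 \\ S_k & S_k \end{pmatrix}$ and a hypothetical wire encoding (inputs on wires 0 to n-1, gate g on wire n+g); it builds the circuit, counts its gates and checks the value vectors against the rows of $S_k$.

```python
def sierpinski_circuit(k):
    """Return (n, gates, outputs) for a divide-and-conquer circuit computing S_k."""
    n = 1 << k
    gates = []

    def build(lo, size):
        # Return the wires computing S on the inputs x_{lo+1}, ..., x_{lo+size}.
        if size == 1:
            return [lo]
        half = size // 2
        upper = build(lo, half)             # S on the first half of the block
        lower = build(lo + half, half)      # S on the second half of the block
        combined = []
        for u, v in zip(upper, lower):      # one extra gate per remaining output
            gates.append((u, v))
            combined.append(n + len(gates) - 1)
        return upper + combined

    outputs = build(0, n)
    return n, gates, outputs

def value_vectors(n, gates):
    """Value vector of every wire as a bitmask (bit i set iff x_{i+1} occurs)."""
    kappa = [1 << i for i in range(n)]
    for a, b in gates:
        kappa.append(kappa[a] ^ kappa[b])
    return kappa

for k in range(1, 6):
    n, gates, outputs = sierpinski_circuit(k)
    kappa = value_vectors(n, gates)
    # Row i of S_k has a 1 in column j iff j is a submask of i (0-based).
    rows = [sum(1 << j for j in range(n) if (j & ~i) == 0) for i in range(n)]
    correct = [kappa[o] for o in outputs] == rows
    cancellation_free = all((kappa[a] & kappa[b]) == 0 for a, b in gates)
    print(n, len(gates), n * k // 2, correct, cancellation_free)
```

Every gate adds a value over the first half of a block to a value over the second half, so the two children never share a variable and the circuit is cancellation-free by construction.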

5 Stronger Lower Bounds 

In [15], Mehlhorn proved lower bounds on monotone circuits for computing
"Boolean sums". The same proof strategy can be used to prove lower bounds
on cancellation-free linear circuits. For a matrix $A$, denote by $cf(A)$
the smallest cancellation-free linear circuit that computes $A$, and by
$|A|$ the number of 1's in $A$. Let $K_{a,b}$ be the complete bipartite
graph with $a$ vertices in one vertex set and $b$ in the other.






Theorem 5. Let $M$ be an $n \times n$ matrix. Interpret $M$ as a vertex
adjacency matrix for a bipartite graph in the natural way. If this graph
does not contain $K_{h+1,k+1}$ for constants $h, k$, then
$|cf(M)| \in \Omega(|M|)$.

Proof. Consider the class of cancellation-free linear circuits where all
sums of at most $k$ variables are available for free. Let
$\widetilde{cf}(M)$ be the smallest of such circuits computing $M$.
Obviously $|cf(M)| \geq |\widetilde{cf}(M)|$. Since all sums of at most $k$
variables are available for free, anything computed at a gate in
$\widetilde{cf}(M)$ is a sum of at least $k + 1$ variables. Since the
circuit is cancellation-free, for a gate $u$ in $\widetilde{cf}(M)$, its
value vector will never decrease, hence the value vector of a successor to
$u$ will have 1 on the $k + 1$ coordinates where the value vector of $u$
has 1. In particular, since the matrix does not contain $K_{h+1,k+1}$, this
means that any gate $u$ in $\widetilde{cf}(M)$ can have a path to at most
$h$ outputs.

For a fixed row $i$, the cost of computing it is at least

$$|M_i|/k - 1,$$

where $|M_i|$ denotes the number of 1's in row $i$. And since a gate has a
path to at most $h$ outputs, if we sum over all rows we count each gate at
most $h$ times. So the total size of $\widetilde{cf}(M)$ is at least

$$\sum_i (|M_i|/k - 1)/h \in \Omega(|M|).$$

□
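
To make the hypothesis and the bound of Theorem 5 concrete, the following Python sketch tests a small matrix for $K_{h+1,k+1}$ by brute force and evaluates the sum from the proof. The function names are hypothetical, rows are taken as the output side ($h + 1$ rows) and columns as the input side ($k + 1$ columns), and the Fano plane incidence matrix is our own choice of example, not one used in the paper.

```python
from itertools import combinations

def contains_biclique(M, h, k):
    """Return True if some h+1 rows of M all have 1's in a common set of k+1 columns."""
    n_cols = len(M[0])
    for cols in combinations(range(n_cols), k + 1):
        rows_hit = sum(1 for row in M if all(row[c] for c in cols))
        if rows_hit >= h + 1:
            return True
    return False

def theorem5_bound(M, h, k):
    """Evaluate sum_i (|M_i| / k - 1) / h, the bound from the proof."""
    return sum(sum(row) / k - 1 for row in M) / h

# Point-line incidence matrix of the Fano plane: two lines share exactly one
# point, so there is no K_{2,2} and we may take h = k = 1.
lines = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
M = [[1 if c in line else 0 for c in range(7)] for line in lines]
print(contains_biclique(M, 1, 1), theorem5_bound(M, 1, 1))
```

The brute-force test enumerates all column subsets of size $k + 1$, so it is only practical for small matrices and small $k$.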

Now, proving lower bounds for linear cancellation-free circuits is reduced to
the problem of finding dense bipartite graphs not containing $K_{h+1,k+1}$.
This problem is known as the Zarankiewicz problem.

Corollary 1. For any $\varepsilon > 0$, there exists a concrete family of
matrices that requires $\Omega(n^{2-\varepsilon})$ gates when computed by a
cancellation-free linear circuit.

Proof. In [3] , Andreev gave for every e > an explicit construction for an infinite 
family of bipartite graphs with 2n nodes and n 2_e edges that does not contain 
the subgraph Kh+i t k+i where h and k only depend on e. Using this construction 
together with Theorem [5] gives the desired result. □ 

It should be noted that Brown [9] gave a simpler construction of a family of
graphs with $O(n)$ vertices and $\Theta(n^{5/3})$ edges not containing
$K_{3,3}$. Also, Kollár et al. [13] gave a construction similar to
Andreev's, but where the functions $h, k$ grow slower than in Andreev's
construction.

6 Conclusion and Open Problems 

What is the value of $\rho(n)$? If, for some $\delta > 0$,
$\rho(n) \in O(n^{1-\delta})$, Corollary 1 provides an unconditional
superlinear lower bound for a concrete family of matrices.

In the proof of Theorem 4 we did not use the cancellation-free property as
extensively as we did in the proof of Theorem 5. We only used that there is
no path from $x_{n+1}, \ldots, x_{2n}$ to the outputs $y_1, \ldots, y_n$.
Another strategy to prove an unconditional lower bound on the size of
circuits computing the Sierpinski matrix could be to prove that for any
optimal circuit no such path exists. Then the theorem would follow, even
with cancellations.